Abstract



We give a new, elementary proof of a key inequality used by Rudelson in the 
derivation of his well-known bound for random sums of rank-one operators. Our 
approach is based on Ahlswede and Winter's technique for proving operator Chernoff 
bounds. We also prove a concentration inequality for sums of random matrices of 
rank one with explicit constants. 



1 Introduction 

This note mainly deals with estimates for the operator norm $\|Z_n\|$ of random sums

\[ Z_n \equiv \sum_{i=1}^n \epsilon_i A_i \tag{1} \]

of deterministic Hermitian matrices $A_1, \dots, A_n$ multiplied by random coefficients. Recall
that a Rademacher sequence is a sequence $\{\epsilon_i\}_{i=1}^n$ of i.i.d. random variables with $\epsilon_1$ uniform
over $\{-1,+1\}$. A standard Gaussian sequence is a sequence of i.i.d. standard Gaussian
random variables. Our main goal is to prove the following result.

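Theorem 1 (proven in Section 3) Given positive integers $d, n \in \mathbb{N}$, let $A_1, \dots, A_n$ be
deterministic $d \times d$ Hermitian matrices and $\{\epsilon_i\}_{i=1}^n$ be either a Rademacher sequence or a
standard Gaussian sequence. Define $Z_n$ as in (1). Then for all $p \in [1, +\infty)$,

\[ \left(\mathbb{E}\left[\|Z_n\|^p\right]\right)^{1/p} \leq \left(\sqrt{2\ln(2d)} + C_p\right) \left\|\sum_{i=1}^n A_i^2\right\|^{1/2}, \]

where

\[ C_p \equiv \left(p \int_0^{+\infty} t^{p-1} e^{-t^2/2}\, dt\right)^{1/p} \quad (\leq c\sqrt{p} \text{ for some universal } c > 0). \]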



For $d = 1$, this result corresponds to the classical Khintchine inequalities, which give
sub-Gaussian bounds for the moments of $\sum_{i=1}^n \epsilon_i a_i$ ($a_1, \dots, a_n \in \mathbb{R}$). Theorem 1 is implicit
in Section 3 of Rudelson's paper [11], albeit with non-explicit constants. The main Theorem
in that paper is the following inequality, which is a simple corollary of Theorem 1: if
$Y_1, \dots, Y_n$ are i.i.d. random (column) vectors in $\mathbb{C}^d$ which are isotropic (i.e. $\mathbb{E}[Y_1 Y_1^*] = I$,
the $d \times d$ identity matrix), then:



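\[ \mathbb{E}\left\|\frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - I\right\| \leq C\, \mathbb{E}\left[|Y_1|^{\log n}\right]^{1/\log n} \sqrt{\frac{\log d}{n}} \tag{2} \]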

for some universal $C > 0$, whenever the RHS of the above inequality is at most 1. This
important result has been applied to several different problems, such as bringing a convex
body to near-isotropic position [11]; the analysis of low-rank approximations of matrices
[12, 6] and graph sparsification [13]; estimating singular values of matrices with independent
rows [10]; analysing compressive sensing [3]; and related problems in Harmonic
Analysis pUCE].

The key ingredient of the original proof of Theorem 1 is a non-commutative Khintchine
inequality by Lust-Piquard and Pisier [9]. This states that there exists a universal $c > 0$
such that for all $Z_n$ as in the Theorem, all $p \geq 1$ and all $d \times d$ matrices $\{B_i, D_i\}_{i=1}^n$ with
$B_i + D_i = A_i$, $1 \leq i \leq n$,



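\[ \mathbb{E}\left[\|Z_n\|_{S^p}^p\right]^{1/p} \leq c\sqrt{p} \left( \left\|\left(\sum_{i=1}^n B_i B_i^*\right)^{1/2}\right\|_{S^p} + \left\|\left(\sum_{i=1}^n D_i^* D_i\right)^{1/2}\right\|_{S^p} \right), \]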

where $\|\cdot\|_{S^p}$ denotes the $p$-th Schatten norm: $\|A\|_{S^p}^p \equiv \mathrm{Tr}[(A^*A)^{p/2}]$. Unfortunately, the
proof of the Lust-Piquard/Pisier inequality employs language and tools from non-commutative
probability that are rather foreign to most potential users of (2).

This note presents an elementary proof of Theorem 1 that bypasses the above inequality.
Our argument is based on an improvement of the methodology created by Ahlswede
and Winter [2] in order to prove their operator Chernoff bound, which also has many
applications, e.g. [7] (the improvement is discussed in Section 3.1). This approach only requires
elementary facts from Linear Algebra and Matrix Analysis. The most complicated result
that we use is the Golden-Thompson inequality [5j [TJ]:



\[ \forall d \in \mathbb{N},\ \forall\, d \times d \text{ Hermitian matrices } A, B: \quad \mathrm{Tr}(e^{A+B}) \leq \mathrm{Tr}(e^{A} e^{B}). \tag{3} \]

The elementary proof of this classical inequality is sketched in Section 5 below.

We have already noted that Rudelson's bound (2) follows simply from Theorem 1; see
[11, Section 3] for details. Here we prove a concentration lemma corresponding to that
result under the stronger assumption that $|Y_1|$ is a.s. bounded. While similar results have
appeared in other papers [TQl [121 EE], our proof is simpler and gives explicit (albeit quite
large) constants.

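Lemma 1 (Proven in Section 4) Let $Y_1, \dots, Y_n$ be i.i.d. random column vectors in $\mathbb{C}^d$
with $|Y_1| \leq M$ almost surely and $\|\mathbb{E}[Y_1 Y_1^*]\| \leq 1$. Then:

\[ \forall t \geq 0, \quad \mathbb{P}\left( \left\| \frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - \mathbb{E}[Y_1 Y_1^*] \right\| \geq t \right) \leq (2n)^2\, e^{-\frac{n t^2}{16M^2 + 8M^2 t}}. \]

In particular, a calculation shows that:

\[ \left\| \frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - \mathbb{E}[Y_1 Y_1^*] \right\| < \epsilon(n,M) \equiv M \sqrt{\frac{72 \ln n + 48 \ln 2}{n}} \]

with probability $\geq 1 - \frac{1}{n}$ whenever $\epsilon(n,M) \leq 1$. A key feature of this Lemma is that the ambient dimension $d$
plays no direct role in the bound. In fact, the same result holds for $Y_i$ taking values in a
separable Hilbert space (as in the last section of [10]).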
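
As a rough numerical illustration of the "in particular" statement, the sketch below evaluates the tail bound at $t = \epsilon(n, M)$ and compares it with $1/n$. It assumes the bound reads $(2n)^2 \exp(-nt^2/(16M^2 + 8M^2 t))$ and that $\epsilon(n,M) = M\sqrt{(72\ln n + 48\ln 2)/n}$; the function names and the sample values of $n$ and $M$ are chosen here for illustration only.

```python
import math

def lemma1_tail_bound(n: int, M: float, t: float) -> float:
    """Assumed right-hand side of the Lemma 1 tail bound."""
    return (2 * n) ** 2 * math.exp(-n * t * t / (16 * M * M + 8 * M * M * t))

def epsilon(n: int, M: float) -> float:
    """Deviation level eps(n, M) = M * sqrt((72 ln n + 48 ln 2) / n)."""
    return M * math.sqrt((72 * math.log(n) + 48 * math.log(2)) / n)

# Illustrative values only: they are not taken from the text.
for n, M in [(10**6, 1.0), (10**7, 2.0)]:
    eps = epsilon(n, M)
    # When eps <= 1, the bound at t = eps should not exceed 1/n.
    print(n, M, eps, lemma1_tail_bound(n, M, eps), 1 / n)
```

The check only makes sense when $\epsilon(n, M) \leq 1$, which is why fairly large values of $n$ are used: the constants 72 and 48 are explicit but far from small.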

To conclude the introduction, we present an open problem: is it possible to improve
upon Rudelson's bound under further assumptions? There is some evidence that the dependence
on $\ln(d)$ in the Theorem, while necessary in general [12, Remark 3.4], can sometimes
be removed. For instance, Adamczak et al. [1] have improved upon Rudelson's original
application of Theorem 1 to convex bodies, obtaining exactly what one would expect in
the absence of the $\sqrt{\log(2d)}$ term. Another setting where our bound is a $\sqrt{\ln d}$ factor
away from optimality is that of more classical random matrices (cf. the end of Section 3.1
below). It would be interesting if one could sharpen the proof of Theorem 1 in order to
reobtain these results. [Related issues are raised by Vershynin [17].]



2 Preliminaries 

We let $\mathbb{C}^{d \times d}_{\mathrm{Herm}}$ denote the set of $d \times d$ Hermitian matrices, which is a subset of the set $\mathbb{C}^{d \times d}$
of all $d \times d$ matrices with complex entries. The spectral theorem states that all $A \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$
have $d$ real eigenvalues (possibly with repetitions) that correspond to an orthonormal set
of eigenvectors. $\lambda_{\max}(A)$ is the largest eigenvalue of $A$. The spectrum of $A$, denoted by
spec$(A)$, is the multiset of all eigenvalues, where each eigenvalue appears a number of times
equal to its multiplicity. We let

\[ \|C\| \equiv \max_{v \in \mathbb{C}^d,\ |v| = 1} |Cv| \]

denote the operator norm of $C \in \mathbb{C}^{d \times d}$ ($|\cdot|$ is the Euclidean norm). By the spectral theorem,

\[ \forall A \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}, \quad \|A\| = \max\{\lambda_{\max}(A), \lambda_{\max}(-A)\}. \]

Moreover, $\mathrm{Tr}(A)$ (the trace of $A$) is the sum of the eigenvalues of $A$.

2.1 Spectral mapping 

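Let $f : \mathbb{C} \to \mathbb{C}$ be an entire analytic function with a power-series representation $f(z) \equiv
\sum_{n \geq 0} c_n z^n$ ($z \in \mathbb{C}$). If all $c_n$ are real, the expression:

\[ f(A) \equiv \sum_{n \geq 0} c_n A^n \quad (A \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}) \]

corresponds to a map from $\mathbb{C}^{d \times d}_{\mathrm{Herm}}$ to itself. We will sometimes use the so-called spectral
mapping property:

\[ \mathrm{spec}\, f(A) = f(\mathrm{spec}(A)). \tag{4} \]

By this we mean that the eigenvalues of $f(A)$ are the numbers $f(\lambda)$ with $\lambda \in \mathrm{spec}(A)$.
Moreover, the multiplicity of $\xi \in \mathrm{spec}\, f(A)$ is the sum of the multiplicities of all preimages
of $\xi$ under $f$ that lie in $\mathrm{spec}(A)$.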

2.2 The positive-semidefinite order 

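We will use the notation $A \succeq 0$ to say that $A$ is positive-semidefinite, i.e. $A \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$ and
its eigenvalues are non-negative. This is equivalent to saying that $(v, Av) \geq 0$ for all
$v \in \mathbb{C}^d$, where $(\cdot, \cdot\cdot)$ is the standard Euclidean inner product.

If $A, B \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$, we write $A \succeq B$ or $B \preceq A$ to say that $A - B \succeq 0$. Notice that "$\succeq$" is
a partial order and that:

\[ \forall A, B, A', B' \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}, \quad (A \preceq A') \wedge (B \preceq B') \Rightarrow A + B \preceq A' + B'. \tag{5} \]

Moreover, spectral mapping (4) implies that:

\[ \forall A \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}, \quad A^2 \succeq 0. \tag{6} \]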

We will also need the following simple fact. 
Proposition 1 For all $A, B, C \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$:

\[ (C \succeq 0) \wedge (A \preceq B) \Rightarrow \mathrm{Tr}(AC) \leq \mathrm{Tr}(BC). \tag{7} \]

Proof: To prove this, assume the LHS and observe that the RHS is equivalent to $\mathrm{Tr}(C\Delta) \geq 0$
where $\Delta \equiv B - A$. By assumption, $\Delta \succeq 0$, hence it has a Hermitian square root $\Delta^{1/2}$.
The cyclic property of the trace implies:

\[ \mathrm{Tr}(C\Delta) = \mathrm{Tr}(\Delta^{1/2} C \Delta^{1/2}). \]

Since the trace is the sum of the eigenvalues, we will be done once we show that $\Delta^{1/2} C \Delta^{1/2} \succeq 0$.
But, since $\Delta^{1/2}$ is Hermitian and $C \succeq 0$,

\[ \forall v \in \mathbb{C}^d, \quad (v, \Delta^{1/2} C \Delta^{1/2} v) = ((\Delta^{1/2} v), C(\Delta^{1/2} v)) = (w, Cw) \geq 0 \quad (\text{with } w = \Delta^{1/2} v), \]

which shows that $\Delta^{1/2} C \Delta^{1/2} \succeq 0$, as desired. □
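
A small sanity check of Proposition 1 on real symmetric 2 x 2 matrices may help fix ideas. The matrices below are arbitrary examples chosen for this sketch (they do not come from the argument above), and the helper names are hypothetical. The second comparison suggests that the hypothesis $C \succeq 0$ cannot simply be dropped.

```python
def trace_of_product(x, y):
    """Tr(XY) for 2 x 2 matrices given as nested lists."""
    return sum(x[i][k] * y[k][i] for i in range(2) for k in range(2))

def is_psd_2x2(m):
    """A real symmetric 2 x 2 matrix is PSD iff its diagonal and determinant are >= 0."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return m[0][0] >= 0 and m[1][1] >= 0 and det >= 0

A = [[1, 0], [0, 0]]
Delta = [[1, -1], [-1, 1]]          # PSD, so B = A + Delta satisfies A <= B
B = [[A[i][j] + Delta[i][j] for j in range(2)] for i in range(2)]

C_psd = [[2, 1], [1, 2]]            # positive-semidefinite
C_indefinite = [[0, 1], [1, 0]]     # eigenvalues +1 and -1

# With C >= 0 the trace is monotone, as in (7).
print(trace_of_product(A, C_psd), trace_of_product(B, C_psd))
# With an indefinite C the inequality can reverse.
print(trace_of_product(A, C_indefinite), trace_of_product(B, C_indefinite))
```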

2.3 Probability with matrices 

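Assume $(\Omega, \mathcal{F}, \mathbb{P})$ is a probability space and $Z : \Omega \to \mathbb{C}^{d \times d}_{\mathrm{Herm}}$ is measurable with respect
to $\mathcal{F}$ and the Borel $\sigma$-field on $\mathbb{C}^{d \times d}_{\mathrm{Herm}}$ (this is equivalent to requiring that all entries of $Z$
be complex-valued random variables). $\mathbb{C}^{d \times d}_{\mathrm{Herm}}$ is a metrically complete vector space and
one can naturally define an expected value $\mathbb{E}[Z] \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$. This turns out to be the matrix
$\mathbb{E}[Z] \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$ whose $(i,j)$-entry is the expected value of the $(i,j)$-th entry of $Z$. [Of course,
$\mathbb{E}[Z]$ is only defined if all entries of $Z$ are integrable, but this will always be the case in
this paper.]

The definition of expectations implies that traces and expectations commute:

\[ \mathrm{Tr}(\mathbb{E}[Z]) = \mathbb{E}[\mathrm{Tr}(Z)]. \tag{8} \]

Moreover, one can check that the usual product rule is satisfied:

\[ \text{If } Z, W : \Omega \to \mathbb{C}^{d \times d}_{\mathrm{Herm}} \text{ are measurable and independent, } \mathbb{E}[ZW] = \mathbb{E}[Z]\,\mathbb{E}[W]. \tag{9} \]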

3 Proof of Theorem 1

Proof: [of Theorem 1] We wish to control the tail behavior of:

\[ \|Z_n\| = \max\{\lambda_{\max}(Z_n), \lambda_{\max}(-Z_n)\}. \]

However, $Z_n$ and $-Z_n$ have the same distribution. It follows that:

\[ \forall t \geq 0, \quad \mathbb{P}(\|Z_n\| \geq t) \leq 2\,\mathbb{P}(\lambda_{\max}(Z_n) \geq t). \]

The usual Bernstein trick implies that:

\[ \forall t \geq 0, \quad \mathbb{P}(\lambda_{\max}(Z_n) \geq t) \leq \inf_{s \geq 0} e^{-st}\, \mathbb{E}\left[e^{s\lambda_{\max}(Z_n)}\right]. \]

The function "$x \mapsto e^{sx}$" is monotone non-decreasing and positive for all $s \geq 0$. It follows
from the spectral mapping property (4) that for all $s \geq 0$, the largest eigenvalue of $e^{sZ_n}$ is
$e^{s\lambda_{\max}(Z_n)}$ and all eigenvalues of $e^{sZ_n}$ are non-negative. Using the equality "trace = sum of
eigenvalues" implies that for all $s \geq 0$,

\[ \mathbb{E}\left[e^{s\lambda_{\max}(Z_n)}\right] = \mathbb{E}\left[\lambda_{\max}(e^{sZ_n})\right] \leq \mathbb{E}\left[\mathrm{Tr}(e^{sZ_n})\right]. \]

As a result, we have the inequality:

\[ \forall t \geq 0, \quad \mathbb{P}(\|Z_n\| \geq t) \leq 2 \inf_{s \geq 0} e^{-st}\, \mathbb{E}\left[\mathrm{Tr}(e^{sZ_n})\right]. \tag{10} \]

Up to now, our proof has followed Ahlswede and Winter's argument. The next lemma, 
however, will require new ideas. 

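Lemma 2 For all $s \in \mathbb{R}$,

\[ \mathbb{E}\left[\mathrm{Tr}(e^{sZ_n})\right] \leq \mathrm{Tr}\left(e^{\frac{s^2 \sum_{i=1}^n A_i^2}{2}}\right). \]

This lemma is proven below. We will now show how it implies Rudelson's bound. Let

\[ \sigma^2 \equiv \left\|\sum_{i=1}^n A_i^2\right\| = \lambda_{\max}\left(\sum_{i=1}^n A_i^2\right). \]

[The second equality follows from $\sum_{i=1}^n A_i^2 \succeq 0$, which holds because of (5) and (6).] We
note that:

\[ \mathrm{Tr}\left(e^{\frac{s^2 \sum_{i=1}^n A_i^2}{2}}\right) \leq d\, \lambda_{\max}\left(e^{\frac{s^2 \sum_{i=1}^n A_i^2}{2}}\right) = d\, e^{\frac{s^2\sigma^2}{2}}, \]

where the equality is yet another application of spectral mapping (4) and the fact that
"$x \mapsto e^{s^2 x/2}$" is monotone increasing. We deduce from the Lemma and (10) that:

\[ \forall t \geq 0, \quad \mathbb{P}(\|Z_n\| \geq t) \leq 2d \inf_{s \geq 0} e^{-st + \frac{s^2\sigma^2}{2}} = 2d\, e^{-\frac{t^2}{2\sigma^2}}. \tag{11} \]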
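This implies that for any $p \geq 1$,

\[ \frac{1}{\sigma^p}\, \mathbb{E}\left[\left(\|Z_n\| - \sqrt{2\ln(2d)}\,\sigma\right)_+^p\right] = p \int_0^{+\infty} t^{p-1}\, \mathbb{P}\left(\|Z_n\| \geq \left(\sqrt{2\ln(2d)} + t\right)\sigma\right) dt \]
\[ (\text{use (11)}) \leq 2pd \int_0^{+\infty} t^{p-1} e^{-\frac{(t + \sqrt{2\ln(2d)})^2}{2}}\, dt \]
\[ \leq 2pd \int_0^{+\infty} t^{p-1} e^{-\frac{t^2 + 2\ln(2d)}{2}}\, dt = C_p^p. \]

Since $0 \leq \|Z_n\| \leq \sqrt{2\ln(2d)}\,\sigma + (\|Z_n\| - \sqrt{2\ln(2d)}\,\sigma)_+$, this implies the $L^p$ estimate in the
Theorem. The bound "$C_p \leq c\sqrt{p}$" is standard and we omit its proof. □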
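
The two computations used in this proof can be sketched numerically. The block below assumes the reading of (11) as $2d\,e^{-t^2/(2\sigma^2)}$, obtained by minimizing $-st + s^2\sigma^2/2$ at $s = t/\sigma^2$, and the definition $C_p^p = p\int_0^{+\infty} t^{p-1}e^{-t^2/2}\,dt$. The closed form $C_p = \sqrt{2}\,\Gamma(p/2+1)^{1/p}$ is not stated in the text; it follows from the substitution $u = t^2/2$ and is cross-checked here against a plain midpoint rule. All function names and sample values are illustrative.

```python
import math

def chernoff_exponent(s: float, t: float, sigma: float) -> float:
    """Exponent -s*t + s^2*sigma^2/2 appearing inside the infimum in (11)."""
    return -s * t + 0.5 * (s * sigma) ** 2

def tail_bound(t: float, sigma: float, d: int) -> float:
    """Optimized bound 2d * exp(-t^2 / (2 sigma^2)), attained at s = t / sigma^2."""
    return 2 * d * math.exp(-t * t / (2 * sigma * sigma))

def c_p(p: float) -> float:
    """C_p via the Gamma function: C_p^p = 2^(p/2) * Gamma(p/2 + 1)."""
    return math.sqrt(2.0) * math.gamma(p / 2 + 1) ** (1.0 / p)

def c_p_numeric(p: float, upper: float = 40.0, steps: int = 200_000) -> float:
    """Midpoint-rule value of (p * int_0^upper t^(p-1) e^(-t^2/2) dt)^(1/p)."""
    h = upper / steps
    total = 0.0
    for k in range(steps):
        t = (k + 0.5) * h
        total += t ** (p - 1) * math.exp(-0.5 * t * t)
    return (p * h * total) ** (1.0 / p)

def moment_factor(p: float, d: int) -> float:
    """Factor sqrt(2 ln(2d)) + C_p multiplying sigma in the L^p estimate."""
    return math.sqrt(2 * math.log(2 * d)) + c_p(p)

t, sigma = 3.0, 1.5
s_star = t / sigma ** 2
# Coarse grid search over s in [0, 10], to compare with the closed-form minimizer.
grid_min = min(chernoff_exponent(k * 0.001, t, sigma) for k in range(10_001))
print(chernoff_exponent(s_star, t, sigma), grid_min)
print(c_p(1), c_p(2), c_p(3), c_p_numeric(3))
print(moment_factor(2, 100))
```

For $p = 2$ the closed form gives $C_2 = \sqrt{2}$, so in that case the estimate reads $(\mathbb{E}\|Z_n\|^2)^{1/2} \leq (\sqrt{2\ln(2d)} + \sqrt{2})\,\sigma$.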

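To finish, we now prove Lemma 2.
Proof: [of Lemma 2] Define $D_0 \equiv \sum_{i=1}^n s^2 A_i^2/2$ and

\[ D_j \equiv D_0 + \sum_{i=1}^j \left(s\epsilon_i A_i - \frac{s^2 A_i^2}{2}\right) \quad (1 \leq j \leq n). \]

We will prove that for all $1 \leq j \leq n$:

\[ \mathbb{E}\left[\mathrm{Tr}(\exp(D_j))\right] \leq \mathbb{E}\left[\mathrm{Tr}(\exp(D_{j-1}))\right]. \tag{12} \]

Notice that this implies $\mathbb{E}[\mathrm{Tr}(e^{D_n})] \leq \mathbb{E}[\mathrm{Tr}(e^{D_0})]$, which is precisely the Lemma. To
prove (12), fix $1 \leq j \leq n$. Notice that $D_{j-1}$ is independent from $s\epsilon_j A_j - s^2 A_j^2/2$ since the
$\{\epsilon_i\}_{i=1}^n$ are independent. This implies that:

\[ \mathbb{E}\left[\mathrm{Tr}(\exp(D_j))\right] = \mathbb{E}\left[\mathrm{Tr}\left(\exp\left(D_{j-1} + s\epsilon_j A_j - \frac{s^2 A_j^2}{2}\right)\right)\right] \]
\[ (\text{use Golden-Thompson (3)}) \leq \mathbb{E}\left[\mathrm{Tr}\left(\exp(D_{j-1}) \exp\left(s\epsilon_j A_j - \frac{s^2 A_j^2}{2}\right)\right)\right] \]
\[ (\mathrm{Tr}(\cdot) \text{ and } \mathbb{E}[\cdot] \text{ commute, (8)}) = \mathrm{Tr}\left(\mathbb{E}\left[\exp(D_{j-1}) \exp\left(s\epsilon_j A_j - \frac{s^2 A_j^2}{2}\right)\right]\right) \]
\[ (\text{use product rule, (9)}) = \mathrm{Tr}\left(\mathbb{E}\left[\exp(D_{j-1})\right] \mathbb{E}\left[\exp\left(s\epsilon_j A_j - \frac{s^2 A_j^2}{2}\right)\right]\right). \]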



By the monotonicity of the trace (7) and the fact that $\exp(D_{j-1}) \succeq 0$ (which follows
from (4)), we will be done once we show that:

\[ \mathbb{E}\left[\exp\left(s\epsilon_j A_j - \frac{s^2 A_j^2}{2}\right)\right] \preceq I. \tag{13} \]



The key fact is that $s\epsilon_j A_j$ and $-s^2 A_j^2/2$ always commute, hence the exponential of the sum
is the product of the exponentials. Applying (9) and noting that $e^{-s^2 A_j^2/2}$ is constant, we
see that:

\[ \mathbb{E}\left[\exp\left(s\epsilon_j A_j - \frac{s^2 A_j^2}{2}\right)\right] = \mathbb{E}\left[\exp(s\epsilon_j A_j)\right] e^{-\frac{s^2 A_j^2}{2}}. \]

In the Gaussian case, an explicit calculation shows that $\mathbb{E}[\exp(s\epsilon_j A_j)] = e^{s^2 A_j^2/2}$, hence
(13) holds. In the Rademacher case, we have:

\[ \mathbb{E}\left[\exp(s\epsilon_j A_j)\right] e^{-\frac{s^2 A_j^2}{2}} = f(A_j), \]

where $f(z) = \cosh(sz)\, e^{-s^2 z^2/2}$. It is a classical fact that $0 \leq \cosh(x) \leq e^{x^2/2}$ for all $x \in \mathbb{R}$
(just compare the Taylor expansions); this implies that $0 \leq f(\lambda) \leq 1$ for all eigenvalues $\lambda$ of
$A_j$. Using spectral mapping (4), we see that:

\[ \mathrm{spec}\, f(A_j) = f(\mathrm{spec}(A_j)) \subset [0, 1], \]

which implies that $f(A_j) \preceq I$. This proves (13) in this case and finishes the proof of (12)
and of the Lemma. □
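
The scalar inequality behind the Rademacher case can be checked on a grid. The sketch below evaluates $f(\lambda) = \cosh(s\lambda)\,e^{-s^2\lambda^2/2}$ for a range of hypothetical eigenvalues $\lambda$ and parameters $s$ (both ranges are chosen here for illustration) and reports the smallest and largest values found; a grid check of this kind illustrates the inequality but is of course no substitute for the Taylor-expansion argument.

```python
import math

def f(lam: float, s: float) -> float:
    """f(z) = cosh(s z) * exp(-s^2 z^2 / 2), evaluated at an eigenvalue lam."""
    return math.cosh(s * lam) * math.exp(-0.5 * (s * lam) ** 2)

# Grid of eigenvalues and parameters in [-5, 5] with step 0.1 (illustrative ranges).
values = [f(lam / 10, s / 10) for lam in range(-50, 51) for s in range(-50, 51)]

# Every value should lie in [0, 1]; the value 1 is attained when s * lam = 0.
print(min(values), max(values))
```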



3.1 Remarks on the original AW approach 

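A direct adaptation of the original argument of Ahlswede and Winter [2] would lead to an
inequality of the form:

\[ \mathbb{E}\left[\mathrm{Tr}(e^{sZ_n})\right] \leq \mathrm{Tr}\left(\mathbb{E}\left[e^{s\epsilon_n A_n}\right] \mathbb{E}\left[e^{sZ_{n-1}}\right]\right). \]

One sees that:

\[ \mathbb{E}\left[e^{s\epsilon_n A_n}\right] \preceq e^{\frac{s^2 A_n^2}{2}} \preceq e^{\frac{s^2 \|A_n\|^2}{2}} I. \]

However, only the second inequality seems to be useful, as there is no obvious relationship
between

\[ \mathrm{Tr}\left(e^{\frac{s^2 A_n^2}{2}}\, \mathbb{E}\left[e^{sZ_{n-1}}\right]\right) \]

and

\[ \mathrm{Tr}\left(\mathbb{E}\left[e^{s\epsilon_{n-1} A_{n-1}}\right] \mathbb{E}\left[e^{sZ_{n-2} + \frac{s^2 A_n^2}{2}}\right]\right), \]

which is what we would need to proceed with induction. [Note that Golden-Thompson (3)
cannot be undone and fails for three summands, [33].] The best one can do with the second
inequality is:

\[ \mathbb{E}\left[\mathrm{Tr}(e^{sZ_n})\right] \leq d\, e^{\frac{s^2 \sum_{i=1}^n \|A_i\|^2}{2}}. \]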

This would give a version of Theorem 1 with $\sum_{i=1}^n \|A_i\|^2$ replacing $\|\sum_{i=1}^n A_i^2\|$. This modified
result is always worse than the actual Theorem, and can be dramatically so. For
instance, consider the case of a Wigner matrix where:

\[ Z_n \equiv \sum_{1 \leq i < j \leq m} \epsilon_{ij} A_{ij} \]

with the $\epsilon_{ij}$ i.i.d. standard Gaussian and each $A_{ij}$ has ones at positions $(i,j)$ and $(j,i)$ and
zeros elsewhere (we take $d = m$ and $n = \binom{m}{2}$ in this case). Direct calculation reveals:

\[ \left\|\sum_{ij} A_{ij}^2\right\| = \|(m-1)I\| = m - 1 \ll \binom{m}{2} = \sum_{ij} \|A_{ij}\|^2. \]

We note in passing that neither approach is sharp in this case, as $\|\sum_{ij} \epsilon_{ij} A_{ij}\|$ concentrates
around $2\sqrt{m}$ [I].
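
The "direct calculation" for the Wigner example can be reproduced for a small $m$. The sketch below builds each $A_{ij}$ explicitly, sums the squares, and compares $\|\sum_{ij} A_{ij}^2\|$ with $\sum_{ij}\|A_{ij}\|^2$. The helper names and the choice $m = 6$ are illustrative; the code relies on the observation that each $A_{ij}^2$ is diagonal, so its operator norm is its largest diagonal entry.

```python
from itertools import combinations
from math import comb

def basis_matrix(m, i, j):
    """Symmetric m x m matrix with ones at positions (i, j) and (j, i)."""
    a = [[0] * m for _ in range(m)]
    a[i][j] = 1
    a[j][i] = 1
    return a

def matmul(x, y):
    m = len(x)
    return [[sum(x[r][k] * y[k][c] for k in range(m)) for c in range(m)]
            for r in range(m)]

def variance_proxies(m):
    """Return (sum of A_ij^2 as a matrix, sum of ||A_ij||^2)."""
    sum_of_squares = [[0] * m for _ in range(m)]
    sum_of_sq_norms = 0
    for i, j in combinations(range(m), 2):
        a = basis_matrix(m, i, j)
        a2 = matmul(a, a)
        # A_ij^2 is diagonal, so its operator norm is its largest diagonal entry;
        # for Hermitian A_ij this equals ||A_ij||^2.
        assert all(a2[r][c] == 0 for r in range(m) for c in range(m) if r != c)
        sum_of_sq_norms += max(a2[r][r] for r in range(m))
        sum_of_squares = [[u + v for u, v in zip(ru, rv)]
                          for ru, rv in zip(sum_of_squares, a2)]
    return sum_of_squares, sum_of_sq_norms

m = 6
total, sq_norms = variance_proxies(m)
# total should be (m - 1) * I, while sq_norms should equal m choose 2.
print(total[0][0], sq_norms, comb(m, 2))
```

The gap between $m - 1$ and $\binom{m}{2}$ grows linearly in $m$, which is the sense in which the modified bound can be dramatically worse.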



4 Concentration for rank-one operators 



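In this section we prove Lemma 1.
Proof: [of Lemma 1] Let

\[ \phi(s) \equiv \mathbb{E}\left[\exp\left(s \left\|\frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - \mathbb{E}[Y_1 Y_1^*]\right\|\right)\right]. \]

We will show below that:

\[ \forall s \geq 0, \quad \phi(s) \leq 2n\, e^{2M^2 s^2/n}\, \phi(2M^2 s^2/n). \tag{14} \]

By Jensen's inequality, $\phi(2M^2 s^2/n) \leq \phi(s)^{2M^2 s/n}$ whenever $2M^2 s/n \leq 1$, hence (14) implies:

\[ \forall\, 0 \leq s \leq n/2M^2, \quad \phi(s) \leq (2n)^{\frac{1}{1 - 2M^2 s/n}}\, e^{\frac{2M^2 s^2/n}{1 - 2M^2 s/n}}. \]

Since

\[ \forall s \geq 0, \quad \mathbb{P}\left(\left\|\frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - \mathbb{E}[Y_1 Y_1^*]\right\| \geq t\right) \leq e^{-st} \phi(s), \]

the Lemma then follows from the choice

\[ s \equiv \frac{tn}{8M^2 + 4M^2 t} \]

and a few simple calculations. [Notice that $2M^2 s/n \leq 1/2$ with this choice, hence $1/(1 - 2M^2 s/n) \leq 2$.]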

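To prove (14), we begin with symmetrization (see e.g. [5]):

\[ \phi(s) \leq \mathbb{E}\left[\exp\left(2s \left\|\frac{1}{n}\sum_{i=1}^n \epsilon_i Y_i Y_i^*\right\|\right)\right], \]

where $\{\epsilon_i\}_{i=1}^n$ is a Rademacher sequence independent of $Y_1, \dots, Y_n$. Let $S$ be the (random)
span of $Y_1, \dots, Y_n$ and $\mathrm{Tr}_S$ denote the trace operation on linear operators mapping $S$ to
itself. Following the argument in Theorem 1, we notice that:

\[ \mathbb{E}\left[\exp\left(2s \left\|\frac{1}{n}\sum_{i=1}^n \epsilon_i Y_i Y_i^*\right\|\right) \,\Big|\, Y_1, \dots, Y_n\right] \leq 2\,\mathbb{E}\left[\mathrm{Tr}_S\left\{\exp\left(\frac{2s}{n}\sum_{i=1}^n \epsilon_i Y_i Y_i^*\right)\right\} \,\Big|\, Y_1, \dots, Y_n\right]. \]

Lemma 2 implies:

\[ \mathbb{E}\left[\exp\left(2s \left\|\frac{1}{n}\sum_{i=1}^n \epsilon_i Y_i Y_i^*\right\|\right) \,\Big|\, Y_1, \dots, Y_n\right] \leq 2\,\mathrm{Tr}_S\left\{\exp\left(\frac{2s^2}{n^2}\sum_{i=1}^n (Y_i Y_i^*)^2\right)\right\} \leq 2n \exp\left(\frac{2s^2}{n^2}\left\|\sum_{i=1}^n (Y_i Y_i^*)^2\right\|\right), \]

using spectral mapping (4), the equality "trace = sum of eigenvalues" and the fact that $S$
has dimension $\leq n$. A quick calculation shows that $0 \preceq (Y_i Y_i^*)^2 = |Y_i|^2\, Y_i Y_i^* \preceq M^2\, Y_i Y_i^*$,
hence (5) implies:

\[ 0 \preceq \sum_{i=1}^n (Y_i Y_i^*)^2 \preceq M^2 \sum_{i=1}^n Y_i Y_i^*. \]

Therefore:

\[ \frac{2s^2}{n^2}\left\|\sum_{i=1}^n (Y_i Y_i^*)^2\right\| \leq \frac{2M^2 s^2}{n}\left\|\frac{1}{n}\sum_{i=1}^n Y_i Y_i^*\right\| \leq \frac{2M^2 s^2}{n}\left\|\frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - \mathbb{E}[Y_1 Y_1^*]\right\| + \frac{2M^2 s^2}{n}. \]

[We used $\|\mathbb{E}[Y_1 Y_1^*]\| \leq 1$ in the last inequality.] Plugging this into the conditional expectation
above and integrating, we obtain (14):

\[ \phi(s) \leq 2n\, \mathbb{E}\left[\exp\left(\frac{2M^2 s^2}{n}\left\|\frac{1}{n}\sum_{i=1}^n Y_i Y_i^* - \mathbb{E}[Y_1 Y_1^*]\right\| + \frac{2M^2 s^2}{n}\right)\right] = 2n\, e^{2M^2 s^2/n}\, \phi(2M^2 s^2/n). \]
□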
5 Proof sketch for Golden-Thompson inequality

As promised in the Introduction, we sketch an elementary proof of inequality (3). We will
need the Trotter-Lie formula, a simple consequence of the Taylor formula for $e^x$:

\[ \forall A, B \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}, \quad \lim_{n \to +\infty} \left(e^{A/n} e^{B/n}\right)^n = e^{A+B}. \tag{15} \]

The second ingredient is the inequality:

\[ \forall k \in \mathbb{N},\ \forall X, Y \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}: \quad X, Y \succeq 0 \Rightarrow \mathrm{Tr}\left((XY)^{2^{k+1}}\right) \leq \mathrm{Tr}\left((X^2 Y^2)^{2^k}\right). \tag{16} \]

This is proven in [5] via an argument using the existence of positive-semidefinite square roots
for positive-semidefinite matrices, and the Cauchy-Schwarz inequality for the standard
inner product over $\mathbb{C}^{d \times d}$. Iterating (16) implies:

\[ \forall X, Y \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}: \quad X, Y \succeq 0 \Rightarrow \mathrm{Tr}\left((XY)^{2^k}\right) \leq \mathrm{Tr}\left(X^{2^k} Y^{2^k}\right). \]

Apply this to $X = e^{A/2^k}$ and $Y = e^{B/2^k}$ with $A, B \in \mathbb{C}^{d \times d}_{\mathrm{Herm}}$. Spectral mapping (4) implies
$X, Y \succeq 0$ and we deduce:

\[ \mathrm{Tr}\left(\left(e^{A/2^k} e^{B/2^k}\right)^{2^k}\right) \leq \mathrm{Tr}(e^A e^B). \]

Inequality (3) follows from letting $k \to +\infty$, using (15) and noticing that $\mathrm{Tr}(\cdot)$ is continuous.
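
The limiting argument can be watched numerically on a 2 x 2 example. The sketch below is an illustration under stated assumptions, not part of the proof: the matrices `A` and `B` are arbitrary choices, `expm` is a truncated Taylor series written for this example (adequate only for matrices of small norm), and `trotter_trace` is a hypothetical helper computing $\mathrm{Tr}((e^{A/2^k}e^{B/2^k})^{2^k})$ by repeated squaring. For this pair, the traces should decrease from $\mathrm{Tr}(e^A e^B)$ at $k = 0$ towards $\mathrm{Tr}(e^{A+B})$.

```python
import math

def matmul(x, y):
    n = len(x)
    return [[sum(x[i][k] * y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def expm(a, terms=40):
    """Matrix exponential by truncated Taylor series (small norms only)."""
    n = len(a)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        # term becomes a^k / k!
        term = [[v / k for v in row] for row in matmul(term, a)]
        result = [[r + v for r, v in zip(rr, tr)] for rr, tr in zip(result, term)]
    return result

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def scale(a, c):
    return [[c * v for v in row] for row in a]

def trotter_trace(a, b, k):
    """Tr((e^{A/2^k} e^{B/2^k})^{2^k}), computed with k repeated squarings."""
    m = matmul(expm(scale(a, 2.0 ** -k)), expm(scale(b, 2.0 ** -k)))
    for _ in range(k):
        m = matmul(m, m)
    return trace(m)

# Two non-commuting real symmetric matrices (illustrative choice).
A = [[1.0, 0.0], [0.0, -1.0]]
B = [[0.0, 1.0], [1.0, 0.0]]
A_plus_B = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

lhs = trace(expm(A_plus_B))                       # Tr(e^{A+B})
traces = [trotter_trace(A, B, k) for k in range(8)]  # traces[0] = Tr(e^A e^B)
print(lhs)
print(traces)
```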